Retrieving and Exploiting Lexical Semantics for a Comprehensive Corpus of Mathematical Key-phrases: A Programmatic Paper (Preliminary Version)
Author
Abstract
In this paper we describe linguistic models and methods that are being developed for the semi-automatic exploitation and enrichment of a large and comprehensive existing corpus of English mathematical key-phrases. The starting point is the need for enhanced comparison of key-phrases. Among other things, this involves abstracting over intersectivity and the movement of modifiers, resolving prepositional phrase attachment (PPA) ambiguity and the indeterminacy of the relations within productive compounds, and isolating sub-phrases. All this presupposes a substantial amount of lexical semantic information (LSI) or LSI tagging. However, no large store of LSI is available a priori. We aim to exploit the dependencies between the lexical characteristics of the corpus and the conceptual structure of the domain the corpus describes (i.e. mathematics), in order to semi-automatically derive a lexicon in which the lexical entities are modelled to describe the conceptual structure of the domain. We expect such a lexicon to be useful for other types of mathematical texts besides key-phrases. Important issues in our reasoning will be the kind of LSI that is needed, and the creation of a model in which limited hand-tagging forms the basis for the automatic retrieval of LSI.
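As a rough illustration of what abstracting over intersectivity and modifier movement could mean for key-phrase comparison, the Python sketch below is not the paper's method, only a toy under stated assumptions. It assumes a tiny hand-tagged lexicon (`INTERSECTIVE`, hypothetical) marking adjectives that combine intersectively, and key-phrases that are a flat run of adjectives ending in a single head noun (no adverbs, compounds or prepositional phrases). Under those assumptions intersective modifiers may be reordered without changing the concept, so they are compared as an unordered set, while all other modifiers keep their order. The same tagging licenses a crude form of sub-phrase isolation: dropping intersective modifiers yields more general key-phrases. The helper names `normalize`, `same_concept` and `intersective_sub_phrases` are illustrative.

```python
from itertools import combinations

# Hypothetical hand-tagged LSI: adjectives assumed to combine intersectively,
# so that "X Y space" denotes the spaces that are both X and Y.
INTERSECTIVE = {"compact", "connected", "separable", "finite", "abelian"}

# Comparison key: (ordered non-intersective modifiers, set of intersective ones, head)
Key = tuple


def normalize(phrase: str) -> Key:
    """Return a comparison key that ignores the order of intersective modifiers.

    Assumes a flat key-phrase: adjectives followed by a single head noun.
    """
    words = phrase.lower().split()
    if not words:
        raise ValueError("empty key-phrase")
    *modifiers, head = words
    # Intersective modifiers commute, so they are stored as an unordered set.
    intersective = frozenset(m for m in modifiers if m in INTERSECTIVE)
    # All other modifiers keep their relative order, since it may carry meaning.
    others = tuple(m for m in modifiers if m not in INTERSECTIVE)
    return (others, intersective, head)


def same_concept(a: str, b: str) -> bool:
    """True if two key-phrases differ at most in the order of intersective modifiers."""
    return normalize(a) == normalize(b)


def intersective_sub_phrases(phrase: str) -> set:
    """Keys of the proper sub-phrases obtained by dropping intersective modifiers."""
    others, intersective, head = normalize(phrase)
    mods = sorted(intersective)
    subs = set()
    # Proper subsets only: sizes 0 .. len(mods) - 1.
    for k in range(len(mods)):
        for subset in combinations(mods, k):
            subs.add((others, frozenset(subset), head))
    return subs


if __name__ == "__main__":
    print(same_concept("compact connected space", "connected compact space"))
    print(same_concept("partial differential equation",
                       "differential partial equation"))
    print(normalize("compact space") in intersective_sub_phrases("compact connected space"))
```

The design choice is deliberately conservative: only modifiers tagged as intersective are reordered or dropped, because dropping a non-intersective modifier (for example "partial" in "partial differential equation") would change the concept rather than generalize it.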
Similar resources
A Corpus-Based Study of the Lexical Make-up of Applied Linguistics Article Abstracts
This paper reports results from a corpus-based study that explored the frequency of words in the abstracts of applied linguistics journal articles. The abstracts of major articles in leading applied linguistics journals, published since 2005 up to November 2001 were analyzed using software modules from the Compleat Lexical Tutor. The output includes a list of the most frequent content words, list...
Exploiting Context to Identify Lexical Atoms - A Statistical View of Linguistic Context
Interpretation of natural language is inherently context-sensitive. Most words in natural language are ambiguous and their meanings are heavily dependent on the linguistic context in which they are used. The study of lexical semantics cannot be separated from the notion of context. This paper takes a contextual approach to lexical semantics and studies the linguistic context of lexical atoms, ...
Lexical Bundles in English Abstracts of Research Articles Written by Iranian Scholars: Examples from Humanities
This paper investigates a special type of recurrent expressions, lexical bundles, defined as a sequence of three or more words that co-occur frequently in a particular register (Biber et al., 1999). Considering the importance of this group of multi-word sequences in academic prose, this study explores the forms and syntactic structures of three- and four-word bundles in English abstracts writte...
Compositionally Derived Representations of Morphologically Complex Words in Distributional Semantics
Speakers of a language can construct an unlimited number of new words through morphological derivation. This is a major cause of data sparseness for corpus-based approaches to lexical semantics, such as distributional semantic models of word meaning. We adapt compositional methods originally developed for phrases to the task of deriving the distributional meaning of morphologically complex word...
The Impact of Teaching Corpus-based Collocation on EFL Learners' Writing Ability
The present study explores the impact of corpus-based collocation instruction on intermediate Iranian EFL learners' writing ability. For this study, 84 Iranian learners, studying English as a foreign language in Bayan Institute, Iran, were selected and were randomly divided into two groups, experimental and control. Conventional methods of writing instruction were taught to the control...
Journal title:
Volume, issue:
Pages: -
Publication date: 2007